Density estimation on an unknown submanifold
We investigate density estimation from a sample in Euclidean space when the data are supported by an unknown submanifold of possibly unknown dimension under a reach condition. We study nonparametric kernel methods for pointwise and integrated loss, with data-driven bandwidths that incorporate some learning of the geometry via a local dimension estimator. When the density has Hölder smoothness and the submanifold has regularity in a sense to be defined, our estimator achieves a rate that does not depend on the ambient dimension and is asymptotically minimax in one regime of these regularity parameters. Following Lepski's principle, a bandwidth selection rule is shown to achieve smoothness adaptation. We also investigate a further regime: by estimating in some sense the underlying geometry of the submanifold, we establish the minimax rate in a specific dimension, proving in particular that it does not depend on the regularity of the submanifold. Finally, a numerical implementation is conducted on some case studies in order to confirm the practical feasibility of our estimators.
Theoretical Foundations of Ordinal Multidimensional Scaling, Including Internal and External Unfolding
We provide a comprehensive theory of multiple variants of ordinal
multidimensional scaling, including external and internal unfolding. We do so
in the continuous model of Shepard (1966).
Estimating the Reach of a Manifold via its Convexity Defect Function
The reach of a submanifold is a crucial regularity parameter for manifold learning and geometric inference from point clouds. This paper relates the reach of a submanifold to its convexity defect function. Using the stability properties of convexity defect functions, along with some new bounds and the recent submanifold estimator of Aamari and Levrard [Ann. Statist. 47, 177–204 (2019)], an estimator for the reach is given. A uniform expected loss bound over a C^k model is found. Lower bounds for the minimax rate for estimating the reach over these models are also provided. The estimator almost achieves these rates in the C^3 and C^4 cases, with a gap given by a logarithmic factor.
Statistical inference on unknown manifolds (Inférence statistique sur des variétés inconnues)
In high-dimensional statistics, the manifold hypothesis presumes that the data lie near low-dimensional structures, called manifolds. This assumption helps explain why machine learning algorithms work so well on high-dimensional data, and is satisfied for many real-life data sets. We present in this thesis some contributions regarding the estimation of two quantities in this framework: the density of the underlying distribution, and the reach of its support. For the problem of reach estimation, we suggest different strategies based on important geometric invariants, namely the convexity defect functions and measures of metric distortion, from which we derive minimax-optimal rates of convergence. Regarding the problem of density estimation, we propose two approaches: one relying on the frequentist study of a kernel density estimator, and a Bayesian nonparametric approach based on location-scale mixtures of Gaussians. Both methods are shown to be optimal in most settings, and adaptive to the smoothness of the density. Lastly, we examine the behavior of some centrality measures in random geometric graphs, the study of which, although unrelated to the manifold hypothesis, bears methodological and theoretical implications that can be of interest in any statistical framework.
Estimating a density near an unknown manifold: a Bayesian nonparametric approach
We study the Bayesian density estimation of data living in the offset of an
unknown submanifold of the Euclidean space. In this perspective, we introduce a
new notion of anisotropic Hölder regularity for the underlying density and obtain
posterior rates that are minimax optimal and adaptive to the regularity of the
density, to the intrinsic dimension of the manifold, and to the size of the
offset, provided that the latter is not too small, while still allowed to go
to zero. Our Bayesian procedure, based on location-scale mixtures of Gaussians,
appears to be convenient to implement and yields good practical results, even
for quite singular data.